EMDL_m6Am: identifying N6,2′-O-dimethyladenosine sites based on stacking ensemble deep learning

Background N6, 2'-O-dimethyladenosine (m6Am) is an abundant RNA methylation modification on vertebrate mRNAs and is present in the transcription initiation region of mRNAs. It has recently been experimentally shown to be associated with several human disorders, including obesity genes, and stomach cancer, among others. As a result, N6,2′-O-dimethyladenosine (m6Am) site will play a crucial part in the regulation of RNA if it can be correctly identified. Results This study proposes a novel deep learning-based m6Am prediction model, EMDL_m6Am, which employs one-hot encoding to expressthe feature map of the RNA sequence and recognizes m6Am sites by integrating different CNN models via stacking. Including DenseNet, Inflated Convolutional Network (DCNN) and Deep Multiscale Residual Network (MSRN), the sensitivity (Sn), specificity (Sp), accuracy (ACC), Mathews correlation coefficient (MCC) and area under the curve (AUC) of our model on the training data set reach 86.62%, 88.94%, 87.78%, 0.7590 and 0.8778, respectively, and the prediction results on the independent test set are as high as 82.25%, 79.72%, 80.98%, 0.6199, and 0.8211. Conclusions In conclusion, the experimental results demonstrated that EMDL_m6Am greatly improved the predictive performance of the m6Am sites and could provide a valuable reference for the next part of the study. The source code and experimental data are available at: https://github.com/13133989982/EMDL-m6Am.


Introduction
In recent years, dynamic epigenetic changes of RNA have drawn considerable attention in biological study.There are already more than 160 different forms of RNA modifications known [1].According to reports, the majority of RNA modification enzyme mutations have an important role in the emergence of human disorders [2].In rat messenger RNA (mRNA), the N6-methyladenosine (m 6 A) modifications were discovered [3], while N6-2′-O-methyladenosine (m 6 Am) alterations were discovered shortly [4].One of the most prevalent post-transcriptional modifications of mRNA is N6,2′-O-methyladenosine (m 6 Am), which was found on the second base of the modification close to the m7G cap [4].In the meantime, m 6 Am, a terminal alteration generally 2′-O-methylated at the second base close to the 50 cap in mRNA, together with further methylation at the N6 position, has just recently been Identified as a promising target for FTO removal from the human obesity gene [5], associated with obesity [6].
Through the experimental studies, researchers have been able to uncover more and more hidden characteristics of m 6 Am.Recent research has gradually revealed the importance of m 6 Am in biological functions, including in the field of cardiac biology [7] research, gene expression regulation [8], tumor development, and more [9].For example, PCIF1 and m 6 Am were experimentally proven to be essential in the development of gastric cancer by Zhuo et al. [10].The progression of gastric cancer is accelerated by increased methyltransferases driven by m 6 Am methylation changes (PCIF1).By providing resistance to DCP2-mediated mRNA capping, M 6 Am confers mRNA [11].Furthermore, m 6 Am may also control other processes involving the metabolism of RNA.The experimental findings of Jan Mauer et al. [11] demonstrated that m 6 Am plays a significant role in the stability [11][12][13] and translation [14] function of mRNA.Overall, research on m 6 Am's biological effects is still in its early stages, and its primary functions are yet unknown.There are numerous wet-lab experimentation techniques available.As an illustration, Hawley et al. [15] developed miCLIP, a mapping of m 6 A and m 6 Am at single nucleotide precision, to find the m 6 Am sites.Additionally, m6ACE-seq was introduced by Koh et al. [16] to statistically map m 6 A and m 6 Am across the transcriptome.There is also the MeRIP-seq (m 6 A-seq), which has limited resolution, antibody cross-reactivity, and inability to differentiate between cap-m 6 Am and m 6 A in 50 RNA fragments.
Many other researchers are still striving to find m 6 Am sites using wet experimental approaches nowadays, including Sun et al. [17], they used antibodies to m 6 Am to recognize m 6 Am, but there was a limitation that they cannot precisely distinguish between m 6 Am and 5′-UTR m 6 A. Because the existing wet experimental techniques are expensive and time-consuming, it is vital to develop new computational methods for the exact identification of m 6 Am sites.
Despite the fact that prediction of m 6 Am sites have not been studied for a long time, a number of prediction techniques have been developed.The first one bases its prediction identification on conventional machine learning model.Based on RNA sequence and employing electron-ion interaction and pseudo-EIIP (PseEIIP) coding to predict sites, Jiang et al. [18] developed a predictor called m6AmPred, and the team continued to identify m 6 Am sites computationally using deep learning algorithms with proposing a new predictor called MultiRM [19].As deep learning framework, based on the LSTMattention mechanism combined with Word2vec embedding module, can perform multitag prediction, including prediction of m 6 A, m 6 Am, and other 12 RNA modification sites with powerful features.
Recently, Luo et al. [20] used a different coding approach and model than Jiang to predict m 6 Am.They used three coding approaches-one-hot coding, nucleotide chemical properties, and nucleotide density-as well as three base classifiers-multi-headed attention, two parallel embedding modules, CNN and BiLSTM, and a prediction module for m 6 Am sites.As a whole, the computational identification of m 6 Am sites has made some strides, that the algorithms now in use can predict m 6 Am sites with a high level of performance and accuracy.However, there is still a lot that can be done to increase the precision of these predictors for m 6 Am sites identification.
Based on the above-mentioned consideration, we propose EMDL_m6Am, a stacking ensemble deep learning model, in response to the current mission of identifying m 6 Am sites.It ismotivated by those m 6 Am data and the certain sites prediction models.The whole flowchart is displayed in Fig. 1.The model consists of an encoding module for features and a feature extraction module.The raw data is initially encoded in one step.Then, three deep learning models-A, B, and C-are divided up into the feature extraction module, and stacking integration is added later.Conv1D is for a one-dimensional convolutional layer.Avepooling1D stands for an average pooling layer, a fully connected layer is denoted Dense for, and BN for batch normalization.
As shown in Fig. 1, first, we attempted to use a number of popular RNA sequence coding techniques, such as one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), both singly and in combination.Through the prediction performance obtained from the experiment, we finally selected the most straightforward, effective coding technique, one-hot, for RNA sequence coding.Second, we examined the performance of different deep learning predictors, including: DenseNet [21,22], Inflated Convolutional Network and Deep Multiscale Residual Network(MSRN) [23] and Bidirectional gating recurrent unit (BiGRU), and got some inspirations from them.Through carefully testing, we discovered that the DenseNet network based on one-hot encoding performs well.DenseNet still has drawbacks when compared with the currently state-of-the-art predictor DLm6Am, though.Hence, giving that the performance gaps amongst the three deep learning classifiers are not very large, we built a stacking ensemble deep learning model and used a straightforward logistic regression model for prediction in the second layer of stacking.Three deep learning models are employed in the first layer of stacking to forecast the original data to obtain the prediction results, and the second layer filters the first layer's prediction outcomes.Notably, we optimized the input of the second layer of the stacking model in this work, i.e., the output values of the first layer were stitched with the original dataset to serve as the input of the second layer, in order to avoid losing the original feature map information, which was inspired by the stacking prediction model from Jia et al. [24].Finally, the code is obtainable in the Github repository (https:// github.com/ 13133 989982/ EMDL-m6Am).

Benchmark dataset
Lately, Sun et al. [17] proposed a new sequencing approach, m6Am-seq, which can effectively distinguish m 6 Am from m 6 A using RNA immunoprecipitation and selective external demethylation.From this, they used this sequencing approach to provide 2166 m 6 Am sites in the entire human transcriptome at mononuclear resolution with a high confidence level for these sites.Subsequently, luo et al. [20] did a three-step process on the sites information provided by m6Am-seq: first, a sample sequence was extracted using a sliding window of (2δ + 1) − nt, as shown in Eq. ( 1), and a sequence can be displayed as: whereas in BCA (B = C, G, or U) motifs, K stands for the nucleotide adenosine and A for the nucleotide next to K. The distance between each nucleotide and the central site K is indicated by the subscript, with A_(−δ) standing for the δ-th upriver nucleotide from the middle, A_(+δ) denoting the δ-th downriver nucleotide from the center, and so on.In this study, the δ value was configured for 20.The sequence fragment is regarded as a positive sample if the m 6 Am site is in the middle of it; otherwise, it is a negative sample.Secondly, sequences with more than 80% similarity were removed using the redundancy removal tool CD-HIT [25], Finally, negative samples were chosen at random in a 1:1 ratio to create the ultimate benchmark dataset, from which 80% of the samples were chosen as the training set and the residual 20% as the independent test set.The dataset used in our work is from the dataset created in three steps by Luo et al. and we did not do additional processing on their dataset.The size of the dataset used is shown in Table 1, the training set contains 1419 positive samples and 1419 negative samples, and the test set has 355 positive samples and 355 negative samples.The ratio of positive to negative samples is 1:1 in both sets of data. (1)

Feature extraction methods
Use of an appropriate sequence coding approach is essential for site identification.
Researchers have created a number of feature encoding techniques to express RNA sequences with numerical vectors since the input of the majority of models is a numerical vector.As an illustration, we consider the encoding based on the structural or sequence information.In this research, we employ three of the most well-applied coding techniques at present, including one-hot, NCP, and ND [26,27].

Nucleotide chemical property (NCP) and nucleotide density (ND))
The chemical characteristics of nucleotides (NCP), which was proposed by Bari et al. [28], based on the construction of chemical structures of nucleotides, have been used in many recent investigations [29][30][31].
1.The nucleotides are classified into these two categories based on the various ring configurations.As purines, A and G are represented by the number 1, whereas pyrimidines, C and U, are denoted by the number 0. 2. According to their chemical functions, nucleotides are separated into two groups: those representing amino groups, designated by the number 1, such as the nucleotides A and C, and those representing keto groups, denoted by the number 0, such as the nucleotides G and U. 3. Based on the different strengths of the hydrogen bond interactions between base pairs, the nucleotides are divided into two groups.One party comprised the strong interactions and is represented by 1, which is A and U, while the other group represents the weak interactions and is represented by 0, which is G and C.
In this way, each nucleotide is represented by the following three-dimensional vector. where represents the interaction strength of hydrogen bonds.
The frequency and location distribution information of each nucleotide can be displayed in the nucleotide density (ND).The density (d i ) of a nucleotide can be calculated as d i = n/i, where n is the number of times the nucleotide appears before the i-th position (2) Hi = x i , y i , z i (including the i-th position).For example, the density of A for the sequence "AGT AUU CA" is 1, 0.50, and 0.375 in the first, fourth, and eighth bits, respectively.Likewise, U is 0.20, 0.33 for positions 5 and 6, respectively, etc.
Combining the chemical properties and density of nucleotides for encoding, then each nucleotide can be encoded as a four-dimensional vector shown in Eq. ( 3).
where l stands for the sequence's length.Each sequence data of length 41 nt can be encoded as a two-dimensional numerical matrix of 41 × 4.

Classification model
It is necessary to select the right model in order to correctly anticipate the m 6 Am sites.As the main network structure in this study, three deep learning models were selected, and stacking was then employed to integrate them for site prediction.

DenseNet
Considering the DenseNet [32], a dense connectivity technique, each network layer is connected to the next by a feedforward connection that transfers data across layers.In order to prevent model overfitting, the gradient disappearance problem is effectively addressed, feature propagation is enhanced, feature reuse is promoted, and the number of parameters is drastically reduced.It has been demonstrated that DenseNet outperforms conventional CNNs because it focuses on both low-level and high-level feature information based on a dense hopping connection mechanism to satisfy the goal of mutual and complementary information transfer.Figure 2 shows the connecting mechanism.
Wang et al. [21] employed DenseNet to predict Lysine Acetylation Sites and achieved great performance.Jia et al. [33] successfully implemented the lysine succinylation sites prediction in the DenseNet network in while utilizing an attention module to assess the significance of various feature information.As a result, we were motivated to use the Fig. 2 Connection of the dense convolutional layer model's denseblock module to record feature data at various levels.The output of all the preceding layers is passed on to the next layer as input by each layer in the DenseNet.Therefore, L(L + 1)/2 connections exist across all layers of a DenseNet with X layers.In other words, Eq. ( 4) represents the output of the network's L-th layer.
where H(•) stands for the non-linear transformation function and eLU and conv1D are included.

SENet
We investigated incorporating attention mechanisms into the model to help the model learn crucial feature knowledge in the principle of earlier studies [21,34].We investigated the individual and joint impacts of channel attention, spatial attention, and Squeeze-and-Excitation Networks (SENet).For instance, adding channel attention at the end of the dense net output and adding SENet attention after each dense block inside the DenseNet constitute the combined application of the attention mechanism.The best results, according to ablation investigations, came from applying compressed attention by itself.
Since the attention layer is directly related to the input matrix, it is important to note that we introduced it before DenseNet.This helps the model pay attention to the underlying data and supports the identification of critical location information to avoid distraction.The results of the experiment showed how valuable an attention layer like this is for improving the accuracy of model predictions.To increase the weight of important information and sharpen the attention on it, SENet is added concurrently after each denseblock to the feature map information output.The standard implementation of SENet looks like this.First, the input feature map is initially subjected to globelAvg-Pooling, which results in the feature map's spatial dimension being compressed.Second, to reduce dimensionality and suit the correlation between channels, a fully connected operation is performed on the compressed feature map.Next, the compressed feature map is connected in its entirety.The downscaled feature map is upscaled in the third phase such that Sigmoid can be used to obtain the normalized weights between 0 and 1.Finally, to create the final feature map weighting, the normalized weights are added to the features of each channel.
Specially, this study's SENet is a modified version of the original SENet.In order to achieve feature map information weighting while reusing the original feature map to prevent information loss, the concept of residual network was specifically used to add the original feature map to the matrix created after compression.

DCNN and BiLSTM
Holschneider et al. [35] initially announced dilation convolution, which added intervals to conventional convolution [36] while maintaining the resolution of the feature map.In contrast to standard convolution, dilation convolution contains an additional parameter called the dilation rate, which is primarily used to describe the size of the dilation.It is a normal convolution when the dilation rate is 1. (4) Wang et al. [23] developed the integrated multiscale deep learning predictor EMDLP, which combined inflated convolutional neural network (DCNN) with BiLSTM for the prediction of RNA methylation.Shortly after, Liu et al. [37] proposed MSNet-4mC, which combined residual network concepts to identify 4mC loci and used DCNN to extract feature map data at various scales.We adapted the joint application of DCNN and BiLSTM to the loci prediction of m 6 Am after being encouraged by the prior studies.Starting with sequence data, the learning function f(x) of m 6 Am divides into four segments: one-hot encoding, dilated convolution to extract features, splicing feature map, and extracting contextual information using BiLSTM as shown in Eq. ( 5).
The formula for the dilated convolution in a one-dimensional circumstance is given in Eq. ( 6).Different dilution rates can be thought of as adding different of blank gaps among each convolution kernel.where x i is the i-th input item, y i is the i-th DCNN array's output, ω is the filter's weight, N is the filter's longer, and k is referred to by the dilution rates (DR).
The dilution rate for each of the three blocks of DCNNs in the DCNN stage is 1, 2, and 3, the information of the feature map is learned from three different sensory fields.And, each DCNN block consists of a max-pooling layer with a dropout unit, a dilated convolutional layer with the rectified linear unit (ReLU) as its active function, and a dilated convolutional layer.

MSRN and BiGRU
To find the circRNA-RBP interaction locations, Niu et al. [23] employed a deep multiscale residual network and a network made up of BiGRU with a self-attention mechanism.Both MSRN and BiGRU can effectively represent high-level features and are competent at learning local and global contextual information.Thus, aiming at predicting the m 6 Am sites, we attempted to cascade multiple CNNs of various scales and use them in conjunction with BiGRU.The residuals are introduced in the CNN cascade to ensure that important feature map features are not lost during the convolution phase.So as to prevent information loss, the model also includes BiGRU, which handles the longterm dependencies of the sequences and maintains important aspects utilizing a range of gating functions.
First, we extracted features from a one-hot encoded feature map using six cascaded multi-scale residual blocks (MSRBs), each of which consists of three convolutional layers and 64 convolutional kernels.The output of each MSRB is then integrated to produce global feature fusion.A further 1D convolution and average pooling procedure is then carried out with 192 convolution filters.
Second, to improve the recognition performance of the model and to make up for the fact that the multiscale residual network can only extract sequence correlation (5) and not context-linked information, the output of the deep multiscale residual network is fed into BiGRU to obtain contextual information [38].

Stacking ensemble
In general, a prediction task will produce the different prediction outcomes in distinct predictors.Ensemble learning can combine numerous classifiers to obtain the prediction results that are superior to those by a single classifier.Among them, stacking is a key data science technique that depends on the outcomes of numerous models.By the stacking approach, the prediction outcomes of various models are trained once again as features, and the resulting prediction consequences frequently beat those of a single strong model.
DenseNet, DCNN, and MSRN were the three classifiers we intended to use in this study as the basic classifiers in the first layer.In particular, in order to prevent the information loss and overfitting, the outputs of the first-layer's three classifiers were mixed with the original dataset as the input for the second-layer's classifier.Here, the second-layer classifier that we used was a straightforward logistic regression model.Figure 3 displays the structural framework of the stacking ensemble model.

Performance evaluation
The foundation for evaluating the effectiveness of various models is the employment of common and scientific measures.The following four metrics are commonly used to predict performance for binary classification problems [39,40] in general machine learning, as represented by the following metric equations: Fig. 3 The stacking ensemble model scheme where respectively, TP, TN, FP, and FN stand for the quantity of incorrect predictions in the positive classification, the quantity of incorrect predictions in the negative classification, the number of accurate predictions in the positive category, and the number of correct predictions in the negative category.In addition, the model's efficiency is often evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision and recall curves (AUPR).

Contrasting various feature extraction techniques
Regarding our experiments in the feature extraction section, we use the three feature encoding techniques one-hot, NCP, and ND.The majority of studies [15,18,20] use NCP in conjunction with ND.There are three possible combinations in this manner: one-hot alone, NCP and ND combined, and one-hot and NCP, ND combined.As indicated in Tables 2 and 3, we compare the performance of different feature extraction approaches by performing fivefold cross-validation on the training set and independent tests on the test set, where the best predicted results are shown in bold.
We used one-hot coding as the only coding method for RNA sequence data as the model input data, because Tables 2 and 3 show that when one-hot coding alone was used as the feature coding method, and the important evaluation indexes like Sn, Sp, ACC in cross-validation and independent testing were higher than other individual or combined coding methods.

Comparison of models with different denseblocks
The parameters of the predictor are optimized in DenseNet by setting different numbers of denseblocksand the performance of the model is compared by ACC and MCC values.
It can be clearly seen in Fig. 4 that the predictor performs best when DenseNet picks 6denseblocks.

Model architecture ablation experiment
When choosing models, ablation studies were carried out to evaluate which of the three models-or which conjunction of the three-performed better.Table 4 and Fig. 5 display the predictions made by the three deep learning models alone and in conjunction.Similar to Table 4, the mark "√" in the row of every deep learning model denotes the model's selection for this study; meanwhile, the absence of mark "√" (i.e.blank)-denotes the unselecting of the prediction model.Table 4 and Fig. 5 clearly show that when employing stacking to combine the three deep models DenseNet, DCNN, and MSRN, ACC and MCC for the model were greater than those of the other cases on the train and test sets.Therefore, to extract features from the original data and forecast the m 6 Am sites in this study, we integrated DenseNet, DCNN, and MSRN using stacking.

Comparative analysis of other models
We compared the performance of EMDL_m6Am with a number of representative models, including the conventional machine learning model SVM, XGBoost and the deep learning model CNN and BiLSTM.The results of the fivefold cross-validation of various models on the training set were shown in Fig. 6.This demonstrated that the deep learning model outperformed machine learning in the extraction of features from huge datasets.Additionally, it was clear that EMDL_m6Am exceeds other models in terms of accuracy when predicting the m 6 Am sites.To assess the prediction performance of EMDL_m6Am, we alse compared EMDL_ m6Am with several advanced models for analysis, including: Cross Stage Partial DenseNet (CSPNet) [41], VGG-16 [42], ResNet [43], VGG-19 [44], Inception V3.These models performed differently on the independent test set, as shown in Table 5.Among them, VGG-16 scored the highest in Sn, but the Sp score was too low, lost the balance between Sn and Sp, and had too much deviation for high prediction accuracy.In contrast, our model EMDL_m6Am achieved a balance between Sn and Sp with a deviation of less than 2.25%, and obtained the highest Sp, MCC, Acc, AUC, AUPR, Pre, and F1 scores among several models, yielding the best prediction results overall.To clearly demonstrate the superiority of EMDL_m6Am, we plotted Fig. 7 to show the performance of several models.We even farther confirmed the effectiveness of EMDL_m6Am by contrasting it with other present predictors for predicting m 6 Am sites in RNA sequences, including MultiRM, m6AmPred, and DLm6Am.The four models all used the same dataset to ensure the validity of the experimental comparison.Tables 6 and 7 presented the findings of the fivefold cross-validation of the four models on the training and test sets.It was evident that EMDL_m6Am had significantly better prediction performance on the training set than the other three models, and that it had 7% more Sn, 10% more Sp, 7%-8% more ACC, and 7%-8% more MCC than the most recent model, DLm6Am.When used to balanced datasets, AUPR is less accurate and is less effective and informative for dichotomizing since it is susceptible to sample distribution.Table 7 showed that EMDL_m6Am performed better than the other three models on the independent test set, which also indicated that EMDL_m6Am had good generalization ability.A direct viewing comparison of the two predictors was shown in Fig. 8.This demonstrated that EMDL_m6Am outperformed the current state-ofthe-art model DLm6Am.

Table 7 Comparison of different predictors on test dataset
The conclusions were from the research, as stated by the asterisk (*) [20] The best experimental results are shown in bold To test the robustness of the model and to evaluate the performance of the model on unbalanced datasets, we used the full transcript dataset and the mature RNA dataset from m6AmPred [18] on m 6 Am.It is worth noting that we used the original unbalanced dataset, in which the full transcript dataset contains 2447 positive samples and the mature RNA dataset includes 1673 positive samples, with the ratio of positive to negative samples being 1:4.The full transcript dataset and the mature RNA dataset were randomly divided into a training set and a test set in the ratio of 8:2.Our model EMDL_ m6Am without any non-equilibrium processing on the dataset, only used the network framework structure of the model for feature extraction and weight assignment to get the final prediction results.Finally, the comparison of the results of EMDL_m6Am and m6AmPred on the two datasets was shown in Table 8.
As shown in Table 8, on the full transcript dataset, the Sn, ACC and MCC values of EMDL_m6Am were higher than those of m6AmPred, and only the Sp values were all slightly lower than it.The same was true on the mature RNA dataset.Collectively, our model EMDL_m6Am outperformed m6AmPred, and also illustrated that EMDL_m6Am performed equally well on the unbalanced dataset with good generalization ability.
In order to verify the robustness of EMDL_m6Am, we randomly selected negative samples several times to conduct five tenfold cross-validation experiments.The results of the experiments were shown in Fig. 9, and the results of the five tenfold cross-validation experiments were similar without too much difference.This indicated that EMDL_ m6Am had stable experimental results for selecting different sets of negative samples.

Sequence analysis of m 6 Am sites and T-SNE visualization of EMDL_m6Am
In this study, we explored the frequency of occurrence of 40 nucleotide bases around the m 6 Am site on RNA sequences to find potential consensus motifs for the sequences.We used an efficient tool, TWO Sample Logo [45], to discover position-specific sign composition differences at the m 6 Am site.In this study, adenine (A) is at the center of the RNA sequence fragment with 20 nucleotides both before and after it.The experimental results were shown in Fig. 10.Nucleotides such as C(cytosine) and U(uracil) were highly represented near both m 6 Am site and nonm 6 Am site, and both appeared in the left position, which was common to both.However, the frequencies and positions of nucleotides at other positions of m 6 Am site and non-m 6 Am site were different, for example, G (guanine), C (cytosine) and U (uracil) were in high proportions near the right side of m 6 Am site.The frequencies of each nucleotide at the front end of the sequence and the back end of the sequence of non-m 6 Am site were similar to the extent that their expression was not obvious.Such analysis indicated that the distances and frequencies between different nucleotides in the sequence played a crucial role in distinguishing m 6 Am sites from nonm 6 Am sites.
In addition, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize the two data features.One wasthe projection of the features by one-hot coding  11b).As can be seen from the Fig. 11, when no prediction is made using EMDL_m6Am, it is not possible to distinguish between positive and negative samples.Conversely, the separation boundary between m 6 Am sites (red data points) and non-m 6 Am sites (blue data points) is very clear after extracting significant features using the three models, which indicates that the proposed EMDL_m6Am has high predictive performance.

Conclusions
M 6 Am plays a key role in the regulation of RNA and in the identification and treatment of obesity genes and some cancers, therefore, it is crucial to develop predictors that can help detect m 6 Am sites.In this work, we proposed a new predictor, EMDL_m6Am, by stacking three deep learning models together to perform m 6 Am site detection in RNA sequences.It has several advantages over previous studies such as (1) EMDL_m6Am does not need to contain a complex feature extraction process like traditional machine learning, it feature coding is simple and advanced feature extraction is done by the model framework.(2) Traditional CNN cannot extract complex features, in this study, we take an ensemble approach by stacking three powerful deep learning models to extract useful features from different dimensions.Also, we compared multiple traditional machine learning models, advanced deep learning models, and different predictors to highlight the predictive performance of EMDL_m6Am.
In addition, EMDL_m6Am may help to predict other more RNA post-modification sites as a way to discover their novel functions.In future studies, it may be possible to eliminate some of the modules with overlapping functions in the three models to simplify the models, or to explore some of the RNA post-modification sites with unspecified functions.

Fig. 4
Fig. 4 Comparison of ACC, MCC values for predictors with different number of denseblocksin DenseNet

Fig. 5 Fig. 6
Fig. 5 Performance evaluation of seven classifiers with three different model combinations through test sets

Fig. 10 A
Fig. 10 A two-sample logo of m 6 Am sites and Non-m 6 Am sites with L = 41

Fig. 11 a
Fig. 11 a Two-dimensional t-SNE visualization of training data with one-hot encoding.b Two-dimensional t-SNE visualization of the learned features from training data by EMDL_m6Am

Table 1
Distribution of the benchmark data set

Table 2
Ablation studies of the feature encoding ways on train set

Table 3
Ablation studies of the feature encoding ways on test set

Table 5
Performance of several advanced models on independent test sets Comparison of the prediction performance of EMDL_m6Am and five advanced models on an independent test setComparison of different classifiers

Table 6
The fivefold cross-validation performance of different predictors

Table 8
Performance comparison of m6AmPred and EMDL_m6Am on independent test sets Fig. 9 Results of five tenfold cross-validation